
Fix silent / misleading failures in paginated waterdata pagination #273

Draft
thodson-usgs wants to merge 1 commit into DOI-USGS:main from thodson-usgs:fix-paginated-truncation-errors

Conversation

@thodson-usgs
Collaborator

Summary

Found while reviewing the repo for high-value bugfixes. Two latent bugs in _walk_pages and the parallel pagination loop in get_stats_data (dataretrieval/waterdata/utils.py) cause silent or misleading failures whenever a paginated waterdata request is interrupted mid-walk.

Bug 1 — resp is stale when client.request() itself raises

while curr_url:
    try:
        resp = client.request(...)              # ← may raise ConnectionError, Timeout, …
        dfs.append(_get_resp_data(resp, geopd))
        curr_url = _next_req_url(resp)
    except Exception:
        error_text = _error_body(resp)          # ← uses *previous* page's resp
        logger.error("Request incomplete. %s", error_text)
        ...

When the network call raises before resp is reassigned, resp still references the previous successful page. The log message therefore describes the wrong response, and _error_body() calls resp.json() on a 200 OK body that isn't an error envelope, which can itself raise and bury the original failure.
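The stale-variable pitfall reproduces in isolation. A minimal sketch (all names hypothetical; each "page" is a callable that either returns a response dict or raises):

```python
def walk(pages):
    # Minimal reproduction of the stale-`resp` pitfall: `resp` keeps the
    # previous page's value when the fetch itself raises.
    resp = None
    collected = []
    for fetch in pages:
        try:
            resp = fetch()
            collected.append(resp["data"])
        except Exception:
            # BUG: fetch() raised before `resp` was reassigned, so this
            # is still the *previous* page's (successful) response.
            return collected, resp
    return collected, None

page1 = lambda: {"data": 1, "url": "page-1"}
def page2():
    raise ConnectionError("boom")

data, stale = walk([page1, page2])
# `stale` describes the previous successful page, not the failed request
```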

Bug 2 — no status-code check on mid-pagination responses

The initial request gate at line 615 reads if resp.status_code != 200: raise RuntimeError(_error_body(resp)). The pagination loop does not. So a 5xx response with a JSON error body that lacks numberReturned flows into _get_resp_data(), which silently returns an empty DataFrame; _next_req_url() finds no next link; and the loop exits with no error logged anywhere. The caller gets truncated data and never knows.

This affects every paginated waterdata getter: get_daily, get_continuous, get_monitoring_locations, get_time_series_metadata, get_combined_metadata, get_field_measurements, get_field_measurements_metadata, get_peaks, get_channel, get_reference_table, plus the stats getters (get_stats_por, get_stats_date_range).
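To illustrate why the truncation is silent, here is a sketch of the described failure path; the helper names mirror the real ones, but the bodies are hypothetical simplifications:

```python
def get_resp_data(body):
    # Simplified stand-in for _get_resp_data: an error envelope without
    # "numberReturned" quietly yields an empty result.
    if not body.get("numberReturned"):
        return []
    return body["features"]

def next_req_url(body):
    # Simplified stand-in for _next_req_url: an error envelope carries no
    # "next" link, so pagination stops without complaint.
    links = body.get("links", [])
    return next((l["href"] for l in links if l.get("rel") == "next"), None)

error_body = {"code": 503, "description": "ServiceUnavailable"}  # 5xx JSON body
rows = get_resp_data(error_body)   # empty, no exception raised
nxt = next_req_url(error_body)     # None, so the loop exits quietly
```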

Fix

Two-line change to each loop:

if resp.status_code != 200:
    raise RuntimeError(_error_body(resp))

…immediately after client.request(...) (mirrors the initial-request gate). And:

except Exception as e:
    logger.error("Request incomplete: %s", e)

…instead of _error_body(resp), so the actual exception text surfaces. This covers both the network-error case (the exception itself) and the 5xx case (the RuntimeError(_error_body(resp)) just raised on a fresh resp).

The "best-effort" behavior is preserved: on failure the loop still logs and returns whatever pages were collected. The change is purely to make the failure observable and the log message accurate.
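Put together, the patched loop looks roughly like this; a hedged sketch where walk_pages, client_request, and the body layout are simplified stand-ins for the real helpers:

```python
import logging

logger = logging.getLogger(__name__)

def walk_pages(client_request, first_url):
    # Sketch of the patched loop. client_request(url) returns an object
    # with .status_code and .json(); the two fixes are marked below.
    dfs, curr_url = [], first_url
    while curr_url:
        try:
            resp = client_request(curr_url)
            if resp.status_code != 200:                # fix 1: status gate
                raise RuntimeError(str(resp.json()))   # on a fresh resp
            body = resp.json()
            dfs.append(body["features"])
            curr_url = body.get("next")
        except Exception as e:                         # fix 2: log the actual
            logger.error("Request incomplete: %s", e)  # exception text
            break                                      # best effort: keep pages
    return dfs

class FakeResp:
    def __init__(self, status, body):
        self.status_code, self._body = status, body
    def json(self):
        return self._body

pages = {
    "p1": FakeResp(200, {"features": [1, 2], "next": "p2"}),
    "p2": FakeResp(503, {"code": 503}),
}
result = walk_pages(lambda url: pages[url], "p1")
# page 1's rows survive; the 503 is logged instead of vanishing
```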

Test plan

  • test_walk_pages_logs_actual_exception_when_request_raises — simulates a ConnectionError mid-pagination, asserts the logged error reflects the actual exception ("boom") rather than something derived from a stale prior-page response.
  • test_walk_pages_surfaces_5xx_mid_pagination — page 2 returns HTTP 503 with a JSON body, asserts the 503 / ServiceUnavailable shows up in the error log. Previously this was silently swallowed.
  • Existing test_walk_pages_multiple_mocked (happy path) still passes.
  • Full mocked test suite: 278 passed, ruff + format clean.
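The 5xx test can be approximated without the repo's fixtures; a sketch using a plain logging handler and a fake client (all names hypothetical, with the patched-loop behavior inlined for self-containment):

```python
import logging

def walk_pages(client_request, first_url, logger):
    # Same patched-loop sketch as described in the fix above.
    dfs, curr_url = [], first_url
    while curr_url:
        try:
            resp = client_request(curr_url)
            if resp.status_code != 200:
                raise RuntimeError(str(resp.json()))
            body = resp.json()
            dfs.append(body["features"])
            curr_url = body.get("next")
        except Exception as e:
            logger.error("Request incomplete: %s", e)
            break
    return dfs

class FakeResp:
    def __init__(self, status, body):
        self.status_code, self._body = status, body
    def json(self):
        return self._body

def test_walk_pages_surfaces_5xx_mid_pagination():
    messages = []
    logger = logging.getLogger("walk-pages-test")
    logger.propagate = False
    handler = logging.Handler()
    handler.emit = lambda record: messages.append(record.getMessage())
    logger.addHandler(handler)
    pages = {
        "p1": FakeResp(200, {"features": [1], "next": "p2"}),
        "p2": FakeResp(503, {"code": 503, "description": "ServiceUnavailable"}),
    }
    result = walk_pages(lambda url: pages[url], "p1", logger)
    assert result == [[1]]                    # best-effort pages kept
    assert any("503" in m for m in messages)  # failure surfaces in the log

test_walk_pages_surfaces_5xx_mid_pagination()
```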

🤖 Generated with Claude Code

Two bugs in _walk_pages (and the parallel get_stats_data pagination
loop) caused silent or misleading failures when a paginated request was
interrupted mid-walk:

1. The except handler called _error_body(resp) — but when
   client.request() itself raised (ConnectionError, Timeout, etc.),
   `resp` still pointed at the *previous successful page*. The
   "incomplete" log message therefore described the wrong request, and
   on a non-JSON 200 body would itself raise inside resp.json() and
   bury the original failure.

2. No status-code check was performed on paginated responses. The
   initial request guards `if resp.status_code != 200`, but the loop
   doesn't. A 5xx body that didn't include "numberReturned" was
   silently turned into an empty DataFrame by _get_resp_data, the
   `next` link wasn't found, and the loop quietly exited — handing the
   user truncated data with no error logged anywhere. This affects
   every paginated waterdata getter (get_daily, get_continuous,
   get_monitoring_locations, …).

Fix:
- Guard the loop body with `if resp.status_code != 200: raise
  RuntimeError(_error_body(resp))`, mirroring the initial request.
- Capture the exception with `as e` and log `e` instead of touching the
  stale `resp`. Same change in get_stats_data.

Behavior preserved: on failure the loop still logs and returns
whatever pages were collected ("best effort"). The change is purely to
make the failure observable and the log message accurate.

Two new tests cover (a) network-exception mid-pagination and (b) 5xx
mid-pagination, asserting that the actual failure surfaces in the
error log.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>